Abstract

Domain-specific knowledge is required to create
specifications, generate code, and understand existing
systems. Our approach to automating software design is
based on instantiating an application domain model with
industry-specific knowledge and then using that model to
achieve the operational goals of specification elicitation
and verification, reverse engineering, and code generation.
Although many different specification models can be 
created from any particular domain model, each 
specification model is consistent and correct with respect to 
the domain model. 

Introduction 

Although empirical field studies (Curtis, et al., 1988)
have shown that application domain knowledge is critical 
to the success of large projects, this knowledge is rarely 
stored in a form which facilitates its use in creating, 
maintaining and evolving software systems. Capturing and 
managing this knowledge is a prerequisite to automating 
software design. 

Unfortunately, domain knowledge is implicitly 
embodied in application code rather than explicitly 
recorded and maintained in separate documents. Even 
when documents are maintained separately from the code, 
the knowledge is stored in voluminous natural language 
documents in an informal rather than a formal manner. 
Although problem-specific languages are designed to 
remedy this situation, domain-specific knowledge is still 
captured in an ad hoc instead of a systematic manner. 
Furthermore, these languages are generally not designed in 
such a way that the results can be generalized or even 
replicated. 

We are attempting to capture the domain-specific 
knowledge about different industry areas as a set of 
application domain models. Application domain models 
are representations of relevant aspects of application 
domains that can be used to achieve specific software 
engineering operational goals. Operational goals are
always implicit in the construction of a domain model and
are essential to understanding the form and content of that
model. Unlike generalized knowledge representation
projects such as Cyc (Lenat, 1990) that attempt to provide a
basis for modeling encyclopedic knowledge, domain
modeling explicitly acknowledges the commonly held view
(Amarel, 1968) that representations are designed for
particular purposes. These purposes, the operational goals,
inherently bias any particular solution and dictate the final
form of the model.

* An earlier version of this paper was presented at the
Asilomar Workshop on Change of Representation and
Problem Reformulation, April 1992.

Many different operational goals and modeling projects 
are being pursued within the field of domain modeling 
(Iscoe, et al., 1991). This paper begins with an overview of 
the domain modeling research at EDS and our 
corresponding operational goals. We explain our approach 
to automating software design as a paradigm which 
facilitates the creation of multiple-specification models 
from a domain model. Finally, we discuss a set of issues 
that we have encountered in achieving our goals. 

Programming-in-the-Large 

EDS produces large software systems for a variety of 
industries such as utilities, finance, health insurance, and so 
on. Associated with each industry area is a rich body of 
knowledge which is critical to specifying and 
implementing the proper software system. This knowledge 
includes legal, financial, technical, and other expertise 
which is acquired by personnel over a period of years. 
EDS is organized into strategic business units (SBUs) so 
that the organization’s knowledge about a particular 
industry can be leveraged through reuse. 

At the EDS Austin research laboratory, we are building a 
domain modeling system which is designed to achieve the 
following operational goals: 

• Requirements & Specifications — Eliciting, verifying, 
and formalizing software requirements and specifications, 

• Program Transformation/Generation — Transforming a 
specification into efficient executable code, 

• Reverse Engineering — Identifying the semantics of 
existing code in terms of a partial specification. 

The realization of these operational goals is consistent 
with our long-term plan for creating knowledge-based tools 
to support programming-in-the-large (Barstow, 1988). The 
domain modeling approach provides ample opportunities 
for creating an automated software development paradigm. 

Figure 1 illustrates the context in which we operate. The
industry knowledge for each SBU is instantiated into a
domain model, which then serves as a source of knowledge
for programs (the ovals) to achieve operational goals, such
as reverse engineering source code or eliciting system
specifications. The figure actually illustrates two different
processes. The left side of Figure 1 shows the process of
domain model instantiation, while the right side illustrates
the domain model being used to produce a single
specification. The System Specification (rectangle)
illustrates a specification for a single specific system within
an application domain. However, a multitude of system
specifications can be created from a domain model.



Figure 2 — Instantiating Specification Models 


Figure 2 illustrates the two separate modeling tasks 
required by our approach. Domain experts interact with a 
system to represent their knowledge in terms of domain 
modeling constructs. Specification designers then use the 
system to build specification models which satisfy 
constraints in the domain model. In order to create a 
system specification, the application designer selects a set
of relevant policies and constraints from the domain model 
that must be included and enforced in the specification 
model. The constraints include intra-attribute as well as 
inter-attribute relationships within and across classes 
relevant to the task at hand. 

Because one of our goals is to generate executable code, 
we require that a particular specification model be 
consistent. A very large but finite number of specification 
models can be created which are consistent and correct 
with respect to a particular domain model.

Reverse Engineering 

We are using reverse engineering to help instantiate both
domain and specification models. Figure 1 illustrates how 
application domain knowledge and programming 
knowledge are used to extract partial specifications from 
source code. The box labeled "programming knowledge"
currently represents knowledge of COBOL syntax, coding 
conventions, and program plans and structures (Van Sickle, 
1992). This knowledge crosses all of the targeted 
application domains and is the basis of a separate code 
browser that operates independently of the operation shown 
in Figure 1. 

We are also attempting to mechanically pre-instantiate 
domain models by using the data gathered from the 
applications of an EDS entity-relationship-based CASE 
tool that is used by SBUs for data modeling and code 
generation. By analyzing data models, we have access to 
tens of thousands of specific entities, relationships, and
constraints which have been used to specify programs and
are useful for partially instantiating domain models.

Modeling Considerations 

Models are inevitably abstractions of reality that capture
information to achieve specific goals. A domain model
determines the items of interest that exist in the world and
sanctions the types of inferences allowed [Liu and Farley,
1990; Davis, 1991]. A model is the result of conscious
decisions about what to describe and what to ignore. No 
model is complete or correct in the sense that it is 
applicable to all tasks. 

Domain models in our system are structured to represent
the type of information that is used within EDS SBUs to
achieve our operational goals. Although EDS serves a
wide range of industries, we are not attempting to model
real-time or other application areas which diverge from
standard business transaction processing. A general issue
of interest in this research is the extent to which any
particular representation/model can be mutated to hold
different types of information for different tasks while still
effectively achieving the original operational goals.

One requirement for our models is that they be 
consistent. Domain and specification model consistency is
maintained by a specialized theorem prover. The theorem
prover, STR+VE, is an upgraded version of the prover
presented in (Bledsoe, 1980) for proofs of theorems in
general inequalities. A truth maintenance system (TMS) is
being constructed to interface between the modeling system
and the theorem prover.

Dynamic Knowledge Structure 

The remainder of this paper presents one aspect of 
domain model representation and gives a glimpse of the 
relationship between specification and domain models and 
the organization of domain models.

While most would agree that hierarchical organizational
strategies provide a reasonable way to structure knowledge
within complex domains, the creation of a hierarchical
structure, like any type of representational scheme, imposes
a particular view of the world. Unfortunately, there is no
particular view that is optimal for every application.
Although the programs within a particular application share
the same legal, physical, and economic constraints, the
construction of any particular specification model depends
upon a set of policy decisions that determine how cases are
handled. Furthermore, large software systems are
continually changing in such a manner that the concept of a
static hierarchy is insufficient to capture the process of
system evolution.

Consider software systems that manage the payment of
health insurance claims. Although conceptually simple,
these systems handle hundreds of thousands of different
cases. One way to represent these cases is to enumerate the
leaf nodes of the hierarchies created by the appropriate
partitioning of attributes such as gender, age, family_status,
previous_condition, employment, deductibles, copayments,
prognosis, and so on. Unfortunately, the tree structure
created by case expansion not only obscures relevant and
interesting cases, but is also a monolithic structure. A
paradox of object-oriented approaches is that well-adapted
structures are not adaptable to new situations.

Because of the combinatorial explosion of the leaf
nodes, it makes sense to handle the cases at as high a level
as possible. Term subsumption systems such as CLASSIC
(Borgida, et al., 1989) automate this process by
determining the place in a hierarchy in which terms are
subsumed. But subsumption systems assume a single
structure in which all sub-models can belong. In the case
of applications such as health insurance, individual
modules may have different hierarchical structures and still
maintain the integrity and constraint rules of the domain
model.
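
To make the combinatorial explosion concrete, the following Python sketch counts and enumerates the leaf cases produced by fully expanding a few attribute partitions. It is an illustration only, not part of the system described here; the category values listed for each attribute are hypothetical, and only the attribute names come from the text.

```python
from itertools import product
from math import prod

# Hypothetical partitions: attribute names follow the text,
# the category values are illustrative assumptions.
partitions = {
    "gender": ["male", "female"],
    "age": ["under65", "65_and_over"],
    "family_status": ["single", "married", "dependent"],
    "employment": ["employed", "unemployed", "retired"],
}


def leaf_count(parts):
    """Number of leaf cases in the fully expanded case tree."""
    return prod(len(values) for values in parts.values())


def enumerate_leaves(parts):
    """Yield every leaf case as an attribute -> value dictionary."""
    names = list(parts)
    for combo in product(*(parts[name] for name in names)):
        yield dict(zip(names, combo))


leaves = list(enumerate_leaves(partitions))
print(leaf_count(partitions))  # 2 * 2 * 3 * 3 = 36
```

Each additional attribute multiplies the number of leaves by the size of its partition, which is why handling cases at the highest possible level matters.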

Attribute Definitions 

Attributes are normally considered as data values or slot
fillers within a class or frame. However, the standard
treatment of attributes as lists of data values with some
underlying machine representation fails both to capture
sufficient semantic information from the application
domain and to state definitions with sufficient formality to
allow semantics-related consistency checks.

Attributes are functions which define how a set of
objects is mapped within a class. One type of attribute has
a value set represented by a nominal scale which consists
of a set of values, $T(A) = \{C_1, \ldots, C_n\}$.

One of the ways that the modeling process maps the
world into a domain model is by creating categories in
such a way that items to be categorized with respect to a
particular attribute are as homogeneous as possible within a
category and as heterogeneous as possible between
categories. Examples of nominal scales abound and map
cleanly to the notion of enumerated type as shown below:

(Colors
    :type nominal_scale
    :values (Red Yellow Green Blue))

The next type of attribute is an ordinal scale: a nominal
scale in which a total ordering exists among the categories.
Interval and ratio scales are the more quantitative scales
and add definitions of dimensions, units, and granularity.

This brief description of attribute type was included to 
allow the reader to understand the examples in this paper.
Attributes have additional types and a number of other
properties which are explained in (Iscoe, et al., 1992).
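
As a rough illustration of how these scale types relate to one another, the sketch below models them as Python classes. The class and field names are assumptions chosen to echo the listing keywords (:type, :values, :dimension, :unit, :granularity); the Severity attribute is hypothetical, and none of this is the representation used by the modeling system itself.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NominalScale:
    """A named, unordered set of categories (an enumerated type)."""
    name: str
    values: Tuple[str, ...]

    def contains(self, value):
        return value in self.values


@dataclass(frozen=True)
class OrdinalScale(NominalScale):
    """A nominal scale whose values are listed in ascending order."""

    def less_than(self, a, b):
        return self.values.index(a) < self.values.index(b)


@dataclass(frozen=True)
class RatioScale:
    """A quantitative scale with a dimension, a unit, and a granularity."""
    name: str
    dimension: str
    unit: str
    granularity: float = 1.0


colors = NominalScale("Colors", ("Red", "Yellow", "Green", "Blue"))
# Hypothetical ordinal attribute, used only to show the ordering.
severity = OrdinalScale("Severity", ("low", "medium", "high"))
```

Making the ordinal scale a subclass of the nominal scale mirrors the definition in the text: an ordinal scale is a nominal scale with an added total ordering.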

Hierarchical Decomposition 

Hierarchies are a natural way to view and organize 
information and, at some level of abstraction, are a part of 
most object-oriented and knowledge representation 
languages. Unfortunately, the simplicity of these concepts
can sometimes obscure the semantics that a model is
attempting to capture. That one's needs dictate one's
ontological choice is a fundamental premise of knowledge
engineering. The ability to systematically define a new set 
of attributes by partitioning the value sets of old attributes 
and then using these new attributes to reclassify the domain 
in accordance with the new requirements is an important 
aspect of our attribute characterization. By preserving the
"ontological map" as a component of the attribute, the 
domain modeler can shift between the differing paradigms 
modeled by various classes of objects. 

Attribute characterization provides a representation and 
systematic methodology for the partitioning of attributes 
that facilitates the way they are organized, subdivided, and
built into hierarchies. An attribute restriction is a new 
attribute whose value set and set of applicable relations are 
subsets of the original attribute. 

Creating a new attribute serves the dual purpose of 
creating a set of views on the old attribute as well as 
creating a new attribute. Often, new attributes are defined 
in terms of old attributes by partitioning the original value 
set and then equating each new attribute value with an
element of the partition. As an example, an accounts
receivable (AR) system may use the attribute
days_to_payment whose value is the average number of
days it takes for the client to pay a bill.

(days_to_payment
    :type ratio_scale
    :dimension time
    :unit days)

From the standpoint of AR applications, a more useful 
attribute might be:

(type_of_payer
    :type ordinal_scale
    :ordered_by lateness_of_payment
    :values (pays_on_time slow_pay dead_beat))



Figure 3 — Partitioning days_to_payment 


This new attribute will be defined by partitioning the
value set of days_to_payment by subdividing the range of
values, then equating each value with one of the elements
of the partition as illustrated in Figure 3 and described as
follows:

(type_of_payer
    :mapped_from days_to_payment
    (pays_on_time (<= 30))
    (slow_pay (AND (> 30) (< 90)))
    (dead_beat (>= 90)))


Note that the days_to_payment attribute is a quantitative
attribute, while the type_of_payer attribute is a
qualitative one. In general, an attribute
mapping represents a loss of information (in this example,
the number of days overdue) in return for a more useful
and inherently less detailed category.
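
The mapping from days_to_payment to type_of_payer can be sketched in Python as follows. This is an illustrative reading of the listing above, assuming that each partition element is a simple predicate on the source value; the names PARTITION and map_attribute are hypothetical.

```python
# Each entry pairs a new attribute value with the predicate that selects
# its element of the partition of days_to_payment.
PARTITION = [
    ("pays_on_time", lambda days: days <= 30),
    ("slow_pay", lambda days: 30 < days < 90),
    ("dead_beat", lambda days: days >= 90),
]


def map_attribute(value, partition=PARTITION):
    """Map a days_to_payment value to exactly one type_of_payer value."""
    matches = [label for label, predicate in partition if predicate(value)]
    if len(matches) != 1:
        # A well-formed partition is exhaustive and non-overlapping.
        raise ValueError(f"{value!r} matches {len(matches)} partition elements")
    return matches[0]


print(map_attribute(45))  # slow_pay
```

The mapping is many-to-one: 31 days and 89 days both become slow_pay, which is the loss of information described above.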

Using Population Parameters 

Population parameters are used to help automate the 
process of creating new attributes from old ones. For 
example, some graduate admissions committees use GRE 
scores to separate applicants into acceptance categories. 
Population parameters allow application designers to create 
new attributes based on restrictions to the original attribute 
as shown below: 



Figure 4 — Using Population Parameters to 
Restrict an Attribute 


Figure 4 shows the GRE score as an attribute which could
be attached to a student. Understanding the distribution of
values within the value set of GRE scores allows
application designers to create partitions in any one of a
variety of ways. For example, assume that an application
designer wanted to create an initial partition based on the
requirement "accept all students who score in the top x%
on the GRE, consider those who score between x% and y%,
and reject those who score in the bottom y%." Given this
type of requirement, the domain model contains the
appropriate information to use and an algorithm to produce
the correct raw score numbers to achieve such a partition.

Another way that these requirements are sometimes
stated is to build a partition based on an absolute raw score.
For example, a requirement like "accept all students who
score above 1450 on the GRE" is easily displayed and
modeled. Furthermore, this type of specification can be
used interactively so that the designer can juggle between
raw scores and percentiles until the partitions appropriate
for the application domain are produced.
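
One way to picture the conversion from a percentile requirement to raw score cutoffs is the Python sketch below. It is an assumption-laden illustration: the text does not specify the algorithm, so a nearest-rank percentile over a sample of scores is used here, percentages are taken to be whole numbers, and the sample scores and function names are invented for the example.

```python
def percentile_cutoff(scores, pct):
    """Nearest-rank percentile: the smallest score with at least
    pct percent of the sample at or below it (pct is an integer)."""
    ordered = sorted(scores)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def partition_by_percentile(scores, top_pct, bottom_pct):
    """Return raw cutoffs and a classifier for accept/consider/reject."""
    accept_above = percentile_cutoff(scores, 100 - top_pct)
    reject_at_or_below = percentile_cutoff(scores, bottom_pct)

    def classify(score):
        if score > accept_above:
            return "accept"
        if score <= reject_at_or_below:
            return "reject"
        return "consider"

    return accept_above, reject_at_or_below, classify


# Invented sample of ten scores, for illustration only.
sample = [1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450]
accept_above, reject_at_or_below, classify = partition_by_percentile(
    sample, top_pct=20, bottom_pct=30
)
```

With this sample, the top 20% corresponds to scores above 1350 and the bottom 30% to scores of 1100 or less. A designer working from an absolute raw score would supply the cutoffs directly instead of deriving them.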

Domain and Specification Models 

In this section we focus on relations between attributes 
within a single domain model class. For the purposes of
this discussion we define the following attributes:

(Name :type identifier)

(Gender :type nominal_scale
    :values (male female))

(Eye_color :type nominal_scale
    :values (brown, blue, green))

(Benefits :type nominal_scale
    :values (Soc_sec, RR, none))

(Age :type ratio_scale
    :dimension (time)
    :unit (year)
    :granularity (1)
    :derived (diff_date cur_date birth_date))

(Medicare_payment :type ratio_scale
    :dimension (money)
    :unit (dollar)
    :granularity (.01)
    :popparms ((min 1) (max 10000) (mean 225)))

(Age_m :type ordinal_scale
    :values (under65 65_and_over)
    :mapped_from age
    (under65 (< 65))
    (65_and_over (>= 65)))

Although many other constraints exist, domain model 
classes can be regarded as consisting of sets of attributes 
which are either required or might be included within a 
particular domain model. These constraints are expressed 
as follows: 

must_have(c, a): attribute a must be used in
class c in a model.

applicable(c, a): attribute a can be used in
class c in a model depending on the choice of the
specification designer.

cond_must_have(c, a, cond): attribute a must
be used in class c in a model if condition cond
evaluates to true.

cond_applicable(c, a, cond): attribute a can be
used in class c in a model if condition cond
evaluates to true.

Within any particular specification model, an attribute is 
simply classified as used within a class. 

used(m, c, a): within model m, attribute a is
used in class c.

The most straightforward relationship between a domain 
model and a specification model is that must_have 
attributes are used in all specification models and 
applicable attributes are selected by the specification
designer. The following rules formalize the semantics of
the four constraints on the use of attributes within classes
listed above. 

(1) must_have(c, a) → ∀m used(m, c, a)

(2) applicable(c, a) → ∃m used(m, c, a)

(3) (cond_applicable c a ((p_1 a_1 v_1) ... (p_n a_n v_n)))

    ∀m, object
    [(used m c a) →
        (used m c a_1) ∧ ... ∧ (used m c a_n) ∧
        [(instance m c object) ∧ (in (a object) T(a))
            → (p_1 (a_1 object) v_1) ∧ ... ∧
              (p_n (a_n object) v_n)]]

(4) (cond_must_have c a ((p_1 a_1 v_1) ... (p_n a_n v_n)))

    ∀m, object
    [(used m c a_1) ∧ ... ∧ (used m c a_n)
        → (used m c a) ∧
        [(p_1 (a_1 object) v_1) ∧
         ... ∧
         (p_n (a_n object) v_n) ∧ (instance m c object)
            → (in (a object) T(a))]]

For example, in a domain model, name might be 
required for all specification models, while eye_color could 
be selected only if it were appropriate for the particular 
specification model. 

(person
    :must_have ((Name ()))
    :applicable ((eye_color ()))
    ...)

The application of these constraints when cond is 
vacuously true is a fairly standard feature in most modeling 
languages of this type. However, name and eye_color are 
attributes which are total functions and are not as 
interesting as the cases that occur when the attributes are
partial functions. 
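
For the unconditional case, the relationship between a domain model and one specification model can be sketched as a simple check. The Python below is illustrative only: the dictionary layout and the name check_spec are assumptions, and treating attributes outside must_have and applicable as violations is an additional assumption not stated by rules (1) and (2). Rule (2) is existential over all models, so it cannot be verified against a single model and is not checked here.

```python
# Hypothetical encoding of the person class from the listing above.
domain_model = {
    "person": {
        "must_have": {"Name"},
        "applicable": {"Eye_color", "Gender"},
    }
}


def check_spec(domain, spec):
    """Return violations for a specification model.

    spec maps each class name to the set of attributes used in it.
    """
    problems = []
    for cls, used in spec.items():
        rules = domain[cls]
        # Rule (1): every must_have attribute is used in every model.
        for missing in sorted(rules["must_have"] - used):
            problems.append(f"{cls}: must_have attribute {missing} is not used")
        # Assumption: a model may only use attributes the domain model offers.
        allowed = rules["must_have"] | rules["applicable"]
        for extra in sorted(used - allowed):
            problems.append(f"{cls}: attribute {extra} is not in the domain model")
    return problems


good_spec = {"person": {"Name", "Eye_color"}}
bad_spec = {"person": {"Eye_color", "Shoe_size"}}
```

Here good_spec yields no violations, while bad_spec omits Name and uses an attribute the domain model does not define.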

Conditions for Function Evaluation 

Recalling that an attribute is a function which maps 
objects to a particular property, cond can be interpreted as 
the condition which must be satisfied for the attribute to be
a total instead of a partial function. In other words, cond
defines the subset which is the domain of applicability of
the partial function. For example, for a person class,
Medicare_payment is only applicable if age is 65 or over
and benefits is none.

(cond_applicable person Medicare_payment
((= Age_m 65_and_over) (= Benefits none))) 

The domain modeling system is designed so that the
conditions required to establish the proper domain for an
attribute are automatically maintained. These conditions
are constrained in such a way that tractability is maintained
and are of the form ((p_1 a_1 v_1) ... (p_n a_n v_n)), where p_i is
the name of a predicate, a_i is the name of an attribute, and
v_i is a value of the attribute.

A user can create a specification model with any 
particular class hierarchy as long as the domain policies
and constraints are satisfied. 

We are currently experimenting with ways to capture 
and verify domain modeling constraints by presenting 
redundant information in a variety of ways. We believe 
that many of the specification problems in large systems
are created when value set changes cause a single case to
be changed but fail to correct cases that were identified
from a previous inference.


For example, if we assume that Medicare_payment is 
only applicable if age is 65 or over and benefits is none, the 
system can infer that Medicare_payment cannot apply to a 
person who is younger than 65. 

In fact, assume 

(cond_applicable person Medicare_payment
    ((= Age_m 65_and_over) (= Benefits none))),

then

∀m, object
((used m person Medicare_payment) →
    (used m person Age_m) ∧ (used m person Benefits) ∧
    ((instance m person object) ∧
     (in (Medicare_payment object) [1 10000])
        → (= (Age_m object) 65_and_over) ∧
          (= (Benefits object) none)))        (5)

After Medicare_payment is used in a model, if a user
tries to assign a Medicare_payment to a person who is
younger than 65, using rule (5) will lead to a contradiction.
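
The contradiction produced by rule (5) can be illustrated with a small check over attribute values. This Python sketch is not the theorem prover used by the system; it assumes objects are plain dictionaries, that conditions are (predicate, attribute, value) triples as described earlier, and that only the equality predicate is needed. The names COND_APPLICABLE, VALUE_RANGE, and contradictions are hypothetical.

```python
import operator

PREDICATES = {"=": operator.eq}

# (class, attribute) -> conditions, mirroring the cond_applicable
# declaration for Medicare_payment.
COND_APPLICABLE = {
    ("person", "Medicare_payment"): [
        ("=", "Age_m", "65_and_over"),
        ("=", "Benefits", "none"),
    ],
}

# Value range taken from the :popparms of Medicare_payment (min 1, max 10000).
VALUE_RANGE = {"Medicare_payment": (1, 10000)}


def contradictions(cls, obj):
    """Return the conditions violated by an object carrying a guarded attribute."""
    failed = []
    for (guarded_cls, attr), conditions in COND_APPLICABLE.items():
        if guarded_cls != cls or attr not in obj:
            continue
        low, high = VALUE_RANGE[attr]
        if not low <= obj[attr] <= high:
            continue  # value lies outside the value set; rule (5) is silent
        for pred, other, value in conditions:
            if not PREDICATES[pred](obj.get(other), value):
                failed.append((attr, pred, other, value))
    return failed


young = {"Age_m": "under65", "Benefits": "none", "Medicare_payment": 225}
senior = {"Age_m": "65_and_over", "Benefits": "none", "Medicare_payment": 225}
```

For the object young, the check reports the failed condition on Age_m; for senior it reports nothing.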

A key point is that when people are presented with value 
sets they automatically and unconsciously perform 
substitutions such as the ones listed above. This is a
reasonable way to build a model until a value set changes. 
In large systems, value sets are frequently changed. 
Consequently, conclusions that were drawn by using 
negation to infer values become invalid. We use the 
applicability of conditions and the system’s knowledge of 
value sets to attempt to provide the proper cases for the 
domain modeler to check when conditions change. 
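
The effect of a value set change on conclusions reached by negation can be sketched as follows. The Python below is a hypothetical illustration, not the mechanism implemented in the system: it assumes an inference is recorded as the set of excluded values together with the value concluded, and the added value Pension is invented for the example.

```python
def infer_by_negation(value_set, excluded):
    """Return the single remaining value once the excluded ones are removed,
    or None when the remainder is not unique."""
    remaining = set(value_set) - set(excluded)
    return next(iter(remaining)) if len(remaining) == 1 else None


def cases_to_recheck(inferences, new_value_set):
    """List recorded inferences that no longer hold under the new value set.

    inferences is a list of (excluded_values, concluded_value) pairs.
    """
    return [
        (excluded, concluded)
        for excluded, concluded in inferences
        if infer_by_negation(new_value_set, excluded) != concluded
    ]


benefits = {"Soc_sec", "RR", "none"}
# "Benefits is neither Soc_sec nor RR, so it must be none."
recorded = [(frozenset({"Soc_sec", "RR"}), "none")]

# Hypothetical change: a new category is added to the value set.
changed_benefits = benefits | {"Pension"}
stale = cases_to_recheck(recorded, changed_benefits)
```

Under the original value set the recorded inference is valid; after the change it appears in stale, which is the kind of case a domain modeler would be asked to check.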

Discussion 

In this paper, we have presented the concept of modeling
application domains in order to achieve the operational
goals of program specification, code generation, and
reverse engineering. The main concept is that multiple
specification models can be created that are consistent and
"correct" with respect to a domain model. Domain models
represent information about a particular industry area. 
Specification models represent information about a 
particular system. 

The middle oval on the right side of figure 1 represents 
the process of code generation through program 
transformation. Given a specification model, executable 
code can be generated by performing a series of 
correctness-preserving transformations on the specification.
The goal of this part of the project, which is not yet active,
is to produce efficient code that satisfies the original
specification. 

Domain and specification models are constructed by
using a graphical interface to interactively create a set of
rules based on attribute value set partitions and the
preceding axioms. The system is being implemented using
the Motif GUI on SPARC workstations. Although it is
currently operating in a single-user mode, it is being
designed to be accessed simultaneously by multiple domain
modelers. We are also trying to accelerate the knowledge
capture process by reverse engineering data models that
have been captured by an existing EDS CASE tool and
instantiating them into the appropriate domain models.